Spectro-temporal modulations for robust speech emotion recognition
Abstract
Speech emotion recognition has mostly been studied on clean speech. In this paper, joint spectro-temporal features (RS features) are extracted from an auditory model and applied to detect the emotional state of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database by adding white and babble noise at various SNR levels. A clean-train/noisy-test scenario is investigated to simulate conditions with unknown noise sources. The sequential forward floating selection (SFFS) method is adopted to demonstrate the redundancy of the RS features, and further dimensionality reduction is conducted. Compared with conventional MFCCs plus prosodic features, RS features show higher recognition rates, especially in low-SNR conditions.
Index Terms: emotion recognition, robust, spectro-temporal modulations
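The abstract mentions adding noise to clean recordings at several SNR levels but does not give the mixing procedure. The following is a minimal sketch of one common way to do this, assuming the noise is scaled by a single gain computed from average signal power. The function names `mix_at_snr` and `measured_snr_db`, the sine-wave stand-in for an utterance, and the SNR values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat and trim the noise so it matches the length of the clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that p_clean / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def measured_snr_db(clean, noisy):
    """Compute the SNR of a mixture, given the clean reference signal."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


# Hypothetical data: a sine tone stands in for a speech utterance,
# and Gaussian samples stand in for the white-noise condition.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
white = rng.standard_normal(16000)

# Illustrative SNR levels only; the paper's actual levels are not stated here.
noisy_by_snr = {snr: mix_at_snr(clean, white, snr) for snr in (20, 10, 0)}
```

In a clean-train/noisy-test setup like the one described, only the test utterances would be passed through such a mixing step, while the classifier is trained on the unmodified recordings.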
Similar resources
Methods for capturing spectro-temporal modulations in automatic speech recognition
Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns which are well matched by receptive fields of cortical neurons. In order to improve the performance of automatic speech recognition (ASR) systems a number of different approaches are prese...
Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering.
The robustness of the human auditory system to noise is partly due to the peak preserving capability of the periphery and the cortical filtering of spectro-temporal modulations. In this letter, a robust speech feature extraction scheme is developed that emulates this processing by deriving a spectrographic representation that emphasizes the high energy regions. This is followed by a modulation ...
Spectro-temporal directional derivative features for automatic speech recognition
We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, are able to accurately capture the acoustic modulations of speec...
Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter Bank
This paper analyzes the application of methods developed in automatic speech recognition (ASR) to better understand neural activity measured with electrocorticography (ECoG) during the presentation of speech. ECoG data is collected from temporal cortex in two subjects listening to a matrix sentence test. We investigate the relation of ECoG signals and acoustic speech that has been processed wit...